ATOM Documentation


Architecture Overview

This page describes the complete system architecture of ATOM SaaS, a multi-tenant AI agent platform with cognitive architectures, learning engines, and enterprise-grade governance.

---

High-Level Architecture

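ATOM SaaS follows a layered architecture with a clear separation of concerns: a presentation layer (frontend), an API layer (backend), a data layer, and the underlying infrastructure. Each layer is described in the Technology Stack section.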

---

Technology Stack

Frontend (Presentation Layer)

**Web Application:**

  • **Framework:** Next.js 14 (App Router)
  • **Language:** TypeScript 5.x
  • **UI Library:** React 18
  • **Styling:** Tailwind CSS
  • **Components:** Radix UI primitives
  • **Editor:** Monaco (VS Code editor)
  • **State:** React Context + Server Components

**Desktop Application:**

  • **Framework:** Tauri 2.0
  • **Language:** Rust (backend), JavaScript (frontend)
  • **Features:** Terminal access, Docker integration, local execution
  • **Security:** Sandboxed execution with permission prompts

Backend (API Layer)

**Unified Backend:**

  • **Runtime:** Managed Compute Node running dual processes via supervisord
  • **Frontend Port:** 3000 (Next.js)
  • **Backend Port:** 8000 (FastAPI)
  • **Internal Comm:** Next.js proxies /api/v1 requests to local FastAPI instance

Data Layer

**Primary Database:**

  • **Database:** PostgreSQL 15+
  • **Extension:** pgvector (vector similarity)
  • **Security:** Row-Level Security (RLS) for tenant isolation
  • **Hosting:** Neon PostgreSQL (serverless)

**Vector Database:**

  • **Database:** LanceDB
  • **Purpose:** Semantic search for World Model
  • **Storage:** Local file system (persistent volumes)

**Caching:**

  • **Cache:** Redis
  • **Purpose:** Rate limiting, session caching, pub/sub
  • **Hosting:** Upstash Redis

**File Storage:**

  • **Storage:** AWS S3
  • **Purpose:** User uploads, agent artifacts, canvas exports
  • **Isolation:** Tenant-specific prefixes (s3://atom-saas/{tenant_id}/)

Infrastructure

**Hosting:**

  • **Platform:** ATOM Cloud Platform
  • **Regions:** Multiple regions for low latency (Anycast network)
  • **Features:** Auto-scaling, health checks, rolling deployments

**CI/CD:**

  • **Pipeline:** GitHub Actions
  • **Testing:** 212 E2E tests (100% compliance)
  • **Deployment:** Automated on merge to main

---

Brain Systems Architecture

The brain systems are the core intelligence layer that enables human-like agent behavior:

Brain System Responsibilities

**1. Cognitive Architecture**

  • Human-like reasoning process
  • Attention allocation
  • Memory recall coordination
  • Language processing
  • Problem-solving strategies

**2. Learning Engine**

  • Experience recording (RLHF)
  • Pattern recognition
  • Adaptation generation
  • Behavior modification
  • Performance optimization

**3. World Model**

  • Long-term memory storage
  • Semantic similarity search
  • Experience recall by relevance
  • Canvas context tracking
  • Feedback-aware retrieval
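
To make "experience recall by relevance" concrete, here is a minimal brute-force sketch of cosine-similarity ranking. It is not how LanceDB or the World Model is implemented; the function names, the toy three-dimensional embeddings, and the `k` parameter are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall(
    query: Sequence[float],
    experiences: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[str]:
    """Return the texts of the k experiences most similar to the query embedding."""
    ranked = sorted(
        experiences,
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical stored experiences with toy embeddings.
EXPERIENCES = [
    ("reset password", [1.0, 0.0, 0.0]),
    ("export canvas", [0.0, 1.0, 0.0]),
    ("password policy", [0.9, 0.1, 0.0]),
]
```

A vector database replaces the linear scan with an index, but the ranking criterion is the same idea.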

**4. Reasoning Engine**

  • Proactive intelligence
  • Intervention generation
  • Opportunity identification
  • Automation suggestions
  • Trend analysis

**5. Cross-System Reasoning**

  • Multi-agent coordination
  • Cross-system data correlation
  • Complex problem decomposition
  • Knowledge synthesis

**6. Alpha Evolver**

  • Autonomous code mutation
  • Sandbox-based variant testing
  • Workflow performance optimization
  • Self-improving toolsets

**7. Agent Governance**

  • Permission validation
  • Maturity Calibration (AI-driven)
  • Safety checks
  • Audit logging
  • Rate limiting

**Detailed Brain Systems →**

---

Multi-Tenancy Architecture

Tenant isolation is implemented at multiple layers for enterprise-grade security:

Tenant Isolation Layers

**1. Subdomain Routing**

  • Each tenant gets a unique subdomain, for example tenant.atomagentos.com
  • Custom domains supported
  • Subdomain mapped to tenant_id in database
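
As a rough illustration of the first step of subdomain routing, the sketch below extracts the tenant label from a `Host` header. The function name and the rejection rules are assumptions; mapping the label to a `tenant_id` (and resolving custom domains) would still require the database lookup mentioned above.

```python
from typing import Optional

BASE_DOMAIN = "atomagentos.com"  # taken from the example subdomain above


def tenant_slug_from_host(host: str, base_domain: str = BASE_DOMAIN) -> Optional[str]:
    """Return the tenant subdomain label, or None if the host is not a tenant subdomain."""
    # Drop an optional port, normalize case, and strip a trailing dot.
    hostname = host.split(":", 1)[0].strip().lower().rstrip(".")
    suffix = "." + base_domain
    if not hostname.endswith(suffix):
        return None  # apex domain or custom domain: needs a separate lookup
    slug = hostname[: -len(suffix)]
    # Reject empty labels and nested labels such as "a.b.atomagentos.com".
    if not slug or "." in slug:
        return None
    return slug
```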

**2. Row-Level Security (RLS)**

-- RLS Policy Example
ALTER TABLE agents ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON agents
  FOR ALL
  USING (tenant_id = current_setting('app.current_tenant_id')::UUID);
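
The policy reads `app.current_tenant_id`, so each request has to set that value before it queries. A minimal sketch of how the application side might build that statement follows. The helper name is hypothetical and the `%s` placeholder style assumes a psycopg-like driver; `set_config` with `is_local = true` is standard PostgreSQL and scopes the value to the current transaction.

```python
import uuid
from typing import Tuple


def tenant_context_statement(tenant_id: str) -> Tuple[str, Tuple[str, str]]:
    """Build the parameterized statement that sets the RLS tenant for one transaction."""
    # Validate early so a malformed ID fails here rather than in the ::UUID cast.
    normalized = str(uuid.UUID(tenant_id))
    # is_local=true limits the setting to the current transaction, so it does
    # not leak to another request that reuses the same pooled connection.
    sql = "SELECT set_config(%s, %s, true)"
    return sql, ("app.current_tenant_id", normalized)
```

The statement would run at the start of each transaction, before any query against `agents`. Note that a table's owner bypasses RLS unless `FORCE ROW LEVEL SECURITY` is set, so the application would normally connect as a separate, non-owner role.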

**3. S3 Prefix Isolation**

  • Each tenant gets a dedicated S3 prefix
  • Path format: s3://atom-saas/{tenant_id}/uploads/
  • Bucket policies enforce prefix access
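
Bucket policies enforce the prefix on the storage side. The application can additionally refuse to build keys outside it. The sketch below is one possible way to do that; the function names and validation rules are assumptions, not the platform's actual code.

```python
import posixpath

BUCKET = "atom-saas"  # bucket name from the path format above


def tenant_upload_key(tenant_id: str, filename: str) -> str:
    """Return an object key for an upload, confined to the tenant's uploads prefix."""
    if not tenant_id or "/" in tenant_id or tenant_id in (".", ".."):
        raise ValueError("invalid tenant_id")
    prefix = f"{tenant_id}/uploads/"
    # Normalize the joined path so "../" segments or an absolute filename
    # cannot produce a key outside the tenant's prefix.
    key = posixpath.normpath(posixpath.join(prefix, filename))
    if not key.startswith(prefix):
        raise ValueError("filename escapes the tenant prefix")
    return key


def s3_uri(key: str) -> str:
    """Format a key as an s3:// URI in the shared bucket."""
    return f"s3://{BUCKET}/{key}"
```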

**4. Redis Namespace**

  • Keys namespaced: tenant:{tenant_id}:rate_limit
  • Pub/sub channels scoped: tenant:{tenant_id}:events
  • Session isolation guaranteed
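
The key formats above can be produced by small helpers so that no caller assembles a Redis key by hand. This is a sketch under assumptions: the helper names and the character check are illustrative only.

```python
def _checked(tenant_id: str) -> str:
    """Reject IDs that could break out of the tenant namespace."""
    # ":" is the namespace separator and "*" is a glob in pattern subscriptions.
    if not tenant_id or any(ch in tenant_id for ch in ":* \t\n"):
        raise ValueError("invalid tenant_id")
    return tenant_id


def rate_limit_key(tenant_id: str) -> str:
    """Key that holds the tenant's rate-limit counter."""
    return f"tenant:{_checked(tenant_id)}:rate_limit"


def events_channel(tenant_id: str) -> str:
    """Pub/sub channel that carries the tenant's events."""
    return f"tenant:{_checked(tenant_id)}:events"
```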

**5. Application-Level Filtering**

  • All queries include WHERE tenant_id = ?
  • API responses filter tenant data
  • Background jobs scoped to tenant
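
Application-level filtering can be illustrated with a parameterized query. The example uses an in-memory SQLite database so it runs on its own; the `agents` columns and the function names are assumptions, and the production database is PostgreSQL.

```python
import sqlite3
from typing import List


def list_agent_names(conn: sqlite3.Connection, tenant_id: str) -> List[str]:
    """Return agent names for one tenant; the tenant filter is always bound as a parameter."""
    rows = conn.execute(
        "SELECT name FROM agents WHERE tenant_id = ? ORDER BY name",
        (tenant_id,),
    ).fetchall()
    return [name for (name,) in rows]


def demo_connection() -> sqlite3.Connection:
    """Create a throwaway database holding rows for two hypothetical tenants."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE agents (tenant_id TEXT NOT NULL, name TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO agents VALUES (?, ?)",
        [("t1", "alpha"), ("t1", "beta"), ("t2", "gamma")],
    )
    return conn
```

This filter duplicates what RLS already enforces, which is the point: a bug in one layer does not expose another tenant's rows.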

**Detailed Multi-Tenancy →**

---

Agent Execution Flow

Complete request lifecycle from user input to agent response:

Execution Stages

**1. Request Validation**

  • Authenticate user session
  • Extract tenant context
  • Validate request schema

**2. Governance Checks**

  • Rate limit validation (per-tenant)
  • Permission check (agent maturity)
  • Safety guardrails

**3. Context Resolution**

  • Load agent configuration
  • Resolve task context
  • Fetch relevant settings

**4. Cognitive Processing**

  • Recall relevant experiences (World Model)
  • Generate reasoning chain
  • Determine optimal approach

**5. Skill Execution**

  • Load required skills
  • Execute actions
  • Handle integration calls

**6. Learning & Recording**

  • Record experience to World Model
  • Extract learnings
  • Update patterns

**7. Response Generation**

  • Format response
  • Include metadata
  • Return to user
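
The seven stages can be sketched as an ordered pipeline in which any stage may abort the request. The class, stage identifiers, and placeholder stages below are hypothetical; they only show the control flow, not the platform's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ExecutionContext:
    tenant_id: str
    user_input: str
    trace: List[str] = field(default_factory=list)  # names of completed stages


Stage = Callable[[ExecutionContext], None]

STAGE_NAMES = [
    "request_validation",
    "governance_checks",
    "context_resolution",
    "cognitive_processing",
    "skill_execution",
    "learning_and_recording",
    "response_generation",
]


def noop_stage(ctx: ExecutionContext) -> None:
    """Placeholder; a real stage would read and enrich the context."""


def deny_stage(ctx: ExecutionContext) -> None:
    """Example of a governance stage rejecting the request."""
    raise PermissionError("rate limit exceeded")


def run_pipeline(ctx: ExecutionContext, stages: List[Tuple[str, Stage]]) -> ExecutionContext:
    """Run stages in order; an exception stops all remaining stages."""
    for name, stage in stages:
        stage(ctx)
        ctx.trace.append(name)
    return ctx
```

Because governance runs second, a rejected request never reaches skill execution or the World Model.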

---

Data Flow Diagrams

[Diagrams: Agent Creation Flow · Graduation Exam Flow · Skill Execution Flow]

---

Security Architecture

Multiple security layers protect tenant data and ensure safe agent behavior:

Security Layers

**1. Network Security**

  • TLS 1.3 for all connections
  • DDoS protection (Global edge network)
  • IP whitelisting (enterprise)

**2. Authentication**

  • JWT-based sessions
  • OAuth 2.0 for integrations
  • API key support (BYOK)

**3. Tenant Isolation**

  • Subdomain-based routing
  • Row-Level Security (PostgreSQL)
  • Storage prefix isolation
  • Cache namespace separation

**4. Agent Governance**

  • Maturity-based permissions
  • Real-time permission validation
  • Constitutional guardrails
  • Comprehensive audit logging

**5. Abuse Protection**

  • Per-tenant rate limits
  • Resource quotas (storage, API calls)
  • Anomaly detection
  • Automatic throttling

---

Scalability Architecture

Horizontal and vertical scaling strategies:

Horizontal Scaling

**Auto-Scaling:**

  • CPU-based scaling triggers
  • Memory-based scaling triggers
  • Request queue-based scaling
  • Regional distribution

Vertical Scaling

**Database:**

  • Connection pooling (PgBouncer)
  • Read replicas for analytics
  • Partitioned tables (by tenant)
  • Index optimization

**Cache:**

  • Redis cluster for high availability
  • Tiered caching (L1: memory, L2: Redis)
  • Intelligent cache invalidation

---

Monitoring & Observability

**Detailed Monitoring →**

---

Technology Rationale

Why Next.js?

  • React Server Components for performance
  • Built-in Route Handlers (API routes) for backend logic
  • Excellent developer experience
  • Strong TypeScript support
  • SEO optimization

Why FastAPI?

  • Native async support
  • Automatic OpenAPI documentation
  • High performance (comparable to Node.js)
  • Strong type validation (Pydantic)
  • Easy testing

Why PostgreSQL?

  • ACID compliance
  • Row-Level Security
  • pgvector for vector similarity
  • Excellent reliability
  • Strong ecosystem

Why Neon?

  • Serverless PostgreSQL
  • Auto-scaling storage
  • Branch-based development
  • Built-in connection pooling
  • Competitive pricing

Why LanceDB?

  • Embedded vector database
  • High-performance semantic search
  • Python-native
  • No separate infrastructure
  • Open source

Why Redis?

  • In-memory performance
  • Rich data structures
  • Pub/sub support
  • Rate limiting capabilities
  • Session management

Why ATOM Managed Infrastructure?

  • Simple deployment model
  • Built-in load balancing
  • Multi-region support
  • Integrated security
  • Optimized performance

---

Architecture Patterns Used

1. Layered Architecture

  • Clear separation of concerns
  • Each layer has specific responsibility
  • Easy to test and maintain

2. Event-Driven Architecture

  • Agent executions trigger events
  • Background jobs process asynchronously
  • Real-time updates via pub/sub

3. Multi-Tenancy Patterns

  • Subdomain-based routing
  • Row-Level Security
  • Tenant-scoped caching
  • Isolated storage

4. Plugin Architecture

  • Skill registry for dynamic loading
  • Integration adapters
  • Extensible brain systems
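
A skill registry is commonly built as a name-to-callable map filled by a decorator. The sketch below shows that pattern under assumptions: `SkillRegistry`, the `echo` skill, and the dict-in, dict-out signature are invented for illustration.

```python
from typing import Callable, Dict

SkillFn = Callable[[dict], dict]


class SkillRegistry:
    """Maps skill names to callables so skills can be looked up at run time."""

    def __init__(self) -> None:
        self._skills: Dict[str, SkillFn] = {}

    def register(self, name: str) -> Callable[[SkillFn], SkillFn]:
        def decorator(fn: SkillFn) -> SkillFn:
            if name in self._skills:
                raise ValueError(f"skill already registered: {name}")
            self._skills[name] = fn
            return fn

        return decorator

    def execute(self, name: str, payload: dict) -> dict:
        if name not in self._skills:
            raise LookupError(f"unknown skill: {name}")
        return self._skills[name](payload)


registry = SkillRegistry()


@registry.register("echo")
def echo(payload: dict) -> dict:
    """Hypothetical skill that returns its input text."""
    return {"echo": payload.get("text", "")}
```

Calling `registry.execute("echo", {"text": "hi"})` dispatches by name, so new skills can be added without touching the caller.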

5. CQRS (Command Query Responsibility Segregation)

  • Separate read and write models
  • Optimized for each use case
  • Complex queries use read replicas

---

Performance Considerations

Database Optimization

  • Connection pooling (max 20 connections)
  • Read replicas for analytics queries
  • Indexed foreign keys
  • Partitioned tables by tenant

Caching Strategy

  • L1 cache: In-memory (frequently accessed)
  • L2 cache: Redis (shared across instances)
  • Cache TTL: 5-60 minutes depending on data
  • Invalidation on updates
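
The two-tier lookup can be sketched as follows. A plain dict stands in for Redis as the L2 store, and the class name, the 300-second default TTL (the low end of the 5-60 minute range), and the injectable clock are assumptions made so the example runs on its own.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TwoTierCache:
    """L1: per-process dict with a TTL. L2: shared store (a dict stands in for Redis)."""

    def __init__(
        self,
        l2: Dict[str, Any],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._l1: Dict[str, Tuple[float, Any]] = {}
        self._l2 = l2
        self._ttl = ttl_seconds
        self._clock = clock

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        entry = self._l1.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]  # fresh L1 hit
        if key in self._l2:
            value = self._l2[key]  # L2 hit
        else:
            value = loader()  # miss in both tiers: load from the source
            self._l2[key] = value
        self._l1[key] = (self._clock() + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop the key from both tiers, e.g. after the underlying row is updated."""
        self._l1.pop(key, None)
        self._l2.pop(key, None)
```

With several instances, `invalidate` only clears the local L1. Other instances would keep their copy until the TTL expires unless the invalidation is also broadcast, for example over pub/sub.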

API Performance

  • Response time target: < 200ms (p95)
  • Rate limits: 50/day (free), 5000/day (team)
  • Pagination for large result sets
  • Compression enabled (gzip)
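
The daily limits above could be enforced with a fixed-window counter. The sketch below keeps counts in memory and assumes the window is a UTC calendar day; the class name and window choice are assumptions, and a shared deployment would keep the counter in Redis (for example with `INCR` and an expiry) rather than in a process-local dict.

```python
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple

DAILY_LIMITS = {"free": 50, "team": 5000}  # figures from the list above


class DailyRateLimiter:
    """Fixed-window limiter: one counter per tenant per UTC day."""

    def __init__(self) -> None:
        self._counts: Dict[Tuple[str, str], int] = {}

    def allow(self, tenant_id: str, plan: str, now: Optional[datetime] = None) -> bool:
        now = now or datetime.now(timezone.utc)
        window = now.astimezone(timezone.utc).strftime("%Y-%m-%d")
        key = (tenant_id, window)
        used = self._counts.get(key, 0)
        if used >= DAILY_LIMITS[plan]:
            return False  # over quota for this window
        self._counts[key] = used + 1
        return True
```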

Background Jobs

  • Async task processing
  • Job queues (Redis-based)
  • Automatic retries with exponential backoff
  • Dead letter queue for failed jobs
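
Retries with exponential backoff and a dead letter queue can be sketched in a few lines. The base delay, cap, attempt count, and function names are assumptions, and a list stands in for the Redis-backed dead letter queue.

```python
import time
from typing import Callable, List, Optional, Tuple


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before the next retry: base * 2^attempt, capped."""
    return min(cap, base * (2 ** attempt))


def run_with_retries(
    name: str,
    job: Callable[[], None],
    dead_letters: List[Tuple[str, str]],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run a job, retrying on failure; record it in dead_letters if every attempt fails."""
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            job()
            return True
        except Exception as exc:  # a real worker might retry only transient errors
            last_error = exc
            if attempt < max_attempts - 1:
                sleep(backoff_delay(attempt))
    dead_letters.append((name, repr(last_error)))
    return False
```

Production systems often add random jitter to the delay so that many failing jobs do not retry at the same instant.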

---

**Last Updated:** 2025-02-06

**Architecture Version:** 8.0 (Production Ready)